ABSTRACT

DNAtraffic (http://dnatraffic.ibb.waw.pl/) is intended to be a
unique, comprehensive and richly annotated database of genome
dynamics during the cell life. It contains extensive data on the
nomenclature, ontology, structure and function of proteins related
to DNA integrity mechanisms such as chromatin remodeling, histone
modifications, DNA repair and damage response from eight organisms:
Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis
elegans, Saccharomyces cerevisiae, Schizosaccharomyces pombe,
Escherichia coli and Arabidopsis thaliana. DNAtraffic contains
comprehensive information on the diseases related to the assembled
human proteins. It is also richly annotated with systematic
information on the nomenclature, chemistry and structure of DNA
damage and its sources, including environmental agents and commonly
used drugs targeting nucleic acids and/or proteins involved in the
maintenance of genome stability. One aim of the DNAtraffic database
is to create the first platform for analysis of the combinatorial
complexity of the DNA network. The database includes illustrations
of pathways, damage, proteins and drugs. Since DNAtraffic is
designed to cover a broad spectrum of scientific disciplines, it has
to be extensively linked to numerous external data sources. Our
database is the result of manual annotation work aimed at making
DNAtraffic much more useful for a wide range of systems biology
applications.

INTRODUCTION 

A comprehensive understanding of the maintenance of DNA integrity
during the cell life requires thorough characterization of many
basic data on all nuclear processes involving DNA, including
replication, repair and recombination (3R) and transcription. The
major processes that regulate chromatin structure and counterbalance
its repressive effects are: (i) chromatin remodeling, (ii)
post-translational histone modifications and (iii) histone
replacement. Chromatin is a dynamic structure that modulates the
access of regulatory factors to the genetic material. The main role
of DNA molecules is the long-term storage of the genetic
instructions used in the development and functioning of all known
living organisms (with the exception of RNA viruses). Cells are
continuously exposed to damaging agents whose action results in
modification of nucleic acids. DNA damage from endogenous sources
gives rise to 20000 lesions/mammalian cell/day (1). Lesions are also
caused by errors in DNA metabolic processes, including the formation
of single- and double-strand breaks from the collapse of replication
forks and the introduction of modified nucleic acid bases during DNA
replication. Taken together, 10^6-10^18 repair events occur daily in
a healthy adult man (10^12 cells) (2). DNA damage is also caused by
environmental factors such as chemicals, UV light and ionizing
radiation. In addition, DNA structure and some proteins involved in
DNA replication and repair are targets for the drugs used during
chemotherapy (3). The available anticancer drugs have distinct
mechanisms of action, which may vary in their effects on different
types of normal and cancer cells. Their role is to slow and, ideally,
halt the growth and spread of a cancer.
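
As a rough consistency check, the per-cell lesion rate and the cell count quoted above can be multiplied to give a whole-body figure. The short Python sketch below does only that. It is an illustrative calculation, not one taken from the cited references, and it assumes that every cell carries the same endogenous lesion load; the constant and function names are hypothetical.

```python
# Figures quoted in the text; combining them this way is an illustration only.
LESIONS_PER_CELL_PER_DAY = 20_000  # endogenous lesions per mammalian cell per day (1)
CELLS_IN_ADULT = 10**12            # cell count used in the text for a healthy adult (2)


def lesions_per_day(lesions_per_cell: int, cells: int) -> int:
    """Return the whole-body number of endogenous lesions per day.

    Assumes a uniform lesion rate across all cells, which is a simplification.
    """
    return lesions_per_cell * cells


total = lesions_per_day(LESIONS_PER_CELL_PER_DAY, CELLS_IN_ADULT)
print(f"{total:.1e} endogenous lesions per day")  # prints "2.0e+16 endogenous lesions per day"
```

Under these assumptions the product is 2 x 10^16 lesions per day, which gives a sense of the scale of repair activity that the pathways described in this article have to handle.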

Across the evolutionary spectrum, living organisms depend on
high-fidelity DNA replication and recombination mechanisms, have to
respond to DNA damage, and must balance the harmful and beneficial
effects of alterations of the genetic code. Knowledge of the
processes in charge of DNA metabolism is critical to our
understanding of how and why the genome is affected during the
lifespan of the organism, and of how the DNA repair systems work
efficiently via several different pathways to protect the genome
from potentially mutagenic modification and allow accurate
transmission of genetic information (4). Unrepaired lesions or
strand breaks left in DNA might result from dysfunction in DNA
repair and lead to aging, carcinogenesis or neurodegeneration (5,6).
Some pathological disorders are directly related to defects in DNA
repair, telomere maintenance or the DNA damage response machinery
(7-9). At the same time, random changes in DNA are viewed as a main
source of genetic variability (e.g. in antibody production), and
thus a driving force for evolution. Precise coordination of the
genome networks is crucial to ensure that the correct genetic code
is maintained within the genome.

Traditionally, most of the known information on DNA research, DNA
damage, diseases and drug targets has resided in books, journals and
databases. Moreover, research in molecular biology has focused on
single problems, simplified to the maximal extent. Recently, the
holistic approach to research, referred to as systems biology, has
gained importance and interest in the scientific community. Emerging
databases mainly contain information on sequenced genomes, genes,
proteins, RNAs, etc. Repositories of information on drugs, small
molecules and chemicals are also in common use. The topic of DNA
metabolism is covered by many computational resources. Metabolic
pathway databases contain metabolic pathways from a wide variety of
organisms (10-17). When queried for 'DNA metabolism', 'DNA
replication', 'DNA repair' or 'nucleic acid', those databases return
several dozen results. However, the chromatin maintenance network
contains about 20 subpathways, depending on the organism [e.g.
Escherichia coli cells lack non-homologous end joining (NHEJ) repair
and the Fanconi anemia (FA) pathway].

In contrast to others, the open access DNAtraffic database is a
richly annotated resource for systems biology of DNA research
containing information on: (i) DNA metabolism (replication,
transcription, DNA repair pathways, chromatin organization, histone
modifications and the DNA damage response network in eukaryotic and
prokaryotic organisms); (ii) proteins involved in broadly understood
DNA metabolism; (iii) DNA damage (damage type, damage source and
damage effect); (iv) diseases related to the assembled human
proteins and (v) drugs targeting nucleic acid metabolism and
proteins involved in the maintenance of genome stability.

The DNAtraffic database for systems biology of genome integrity is
intended for scientists, pharmacologists and students.



DETAILS RELATING TO DNAtraffic's OVERALL
DESIGN AND DATA STRUCTURE DEPICTION
CONVENTIONS

The biochemistry and molecular biology of genome dynamics during the
cell life are the key to learning genome stability networks. During
DNA replication, transcription and DNA repair, the cellular
machineries performing these tasks need to gain access to the DNA
that is packaged into chromatin or the nucleoid. The main aim of the
DNAtraffic database is to cover and elucidate the interdisciplinary
knowledge linking all aspects of the DNA integrity processes (e.g.
chromatin dynamics, DNA replication, damage signaling and DNA
repair), DNA damage and drugs interacting with DNA or with proteins
directly involved in DNA metabolism, and to connect all the pieces
together to show the coordination of steps within a pathway and the
crosstalk between different pathways. As transcription,
recombination and DNA integrity are central components in the
evolution of recent genome structures, and because replication,
recombination and repair (3R) were fundamental prerequisites for the
origin of life, all these topics are analyzed and serve as the
cohesive force underlying this comprehensive, DNA topic-focused
database (18).

PathCARD 

We used the KEGG (13) and Reactome (12) databases as sources of data
on pathways and networks concerning DNA metabolism. Some data, such
as the prokaryotic SOS response and translesion synthesis (TLS),
were added directly by our DNAtraffic team. All proteins are
classified first by orthology class and then by DNA integrity
network: chromatin organization and histone modifications,
replication, damage checkpoint, DNA repair, modulation of nucleotide
pools and so on (Table 1). It must be emphasized that all the
described processes are tightly connected to each other and act in
concert, sharing some steps and/or proteins. Known functions of
proteins are indicated in the curator comments section of each
entry. Special emphasis is placed on the function of the protein
within DNA metabolism pathways, but we also refer to alternative
roles in other pathways. Additionally, all Gene Ontology terms
associated with the protein are listed. The pathway in which a given
protein plays a role can also be explored through links from
DNAtraffic to the pathways included in the KEGG and Reactome
databases.

ProteinCARD 

Following the DNA metabolism network, we used the UniProt (19), KEGG
(13) and National Center for Biotechnology Information (NCBI)
databases as sources of protein data for the DNAtraffic database,
covering eight model organisms commonly used for DNA research. We
collected 2921 proteins, including 582 for Homo sapiens, 277 for
Saccharomyces cerevisiae and 91 for E. coli (as of 13 October 2011).
Through direct access from DNAtraffic to each protein, users obtain
an unusual view of well-known proteins from model organisms,
classified into orthology classes. This innovation may be useful for
systems biology research and for proper selection of the model
organism for further study of a selected pathway (Figure 1). Amino
acid and DNA sequences were downloaded from Ensembl. When available,
links to the protein 3D structure in the Protein Data Bank (PDB) are
provided and a 2D picture is shown in the single ProteinCARD entry.
If annotated, possible physical interactions with other proteins
were obtained through IntAct, STRING and other databases providing
interacting protein pairs from small- and large-scale experiments.
Manual annotation work was needed to match each DNA damage or drug
to the appropriate protein and DNA structure. The DNA
metabolism-related proteins from the DNAtraffic database were also
classified by orthology into functional (or predicted by orthology)
activities such as DNA polymerase, DNA ligase, DNA glycosylase, DNA
helicase, nuclease, etc. This step also required manual annotation
work. Using our knowledge and bioinformatics tools, in the near
future proteins will be classified into structural families
according to the presence of characteristic domains, e.g. BRCA1,
BARD1, BRCT and RING, etc. (20,21). Each protein has its own
ProteinCARD entry with a succinct description, reciprocal links to
pathway(s) and, where they exist, to disease and DNA damage, and
additional external links to the NCBI, KEGG and UniProt databases.
Moreover, from a single ProteinCARD the user can view and access the
other proteins from the same orthology class. This information can
be useful for the systemic study of DNA integrity.

Table 1. Distribution of the orthology classes into DNA maintenance
network in DNAtraffic database
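
The orthology-class view described in this section can be pictured as a grouping of protein records by class and then by organism. The following Python sketch is a minimal illustration under stated assumptions: the `Protein` record, the `group_by_orthology` function and the class labels are hypothetical and do not reproduce DNAtraffic's actual data model or identifiers. The example proteins are well-known homologs chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    name: str
    organism: str
    orthology_class: str  # hypothetical label, not an actual DNAtraffic identifier


def group_by_orthology(proteins):
    """Group protein names by orthology class, then by organism."""
    groups = defaultdict(dict)
    for p in proteins:
        groups[p.orthology_class].setdefault(p.organism, []).append(p.name)
    # Convert to plain dicts so that missing keys raise KeyError as usual.
    return {cls: dict(by_org) for cls, by_org in groups.items()}


# Illustrative records only; class labels are invented for this sketch.
proteins = [
    Protein("RecA", "Escherichia coli", "RecA/RAD51 recombinase"),
    Protein("Rad51", "Saccharomyces cerevisiae", "RecA/RAD51 recombinase"),
    Protein("RAD51", "Homo sapiens", "RecA/RAD51 recombinase"),
    Protein("Ung", "Escherichia coli", "Uracil-DNA glycosylase"),
    Protein("UNG", "Homo sapiens", "Uracil-DNA glycosylase"),
]

groups = group_by_orthology(proteins)
for organism, names in groups["RecA/RAD51 recombinase"].items():
    print(organism, names)
```

A grouping of this kind makes it easy to see which model organisms have a member of a given class, which is the kind of comparison that can guide the choice of model organism for a pathway of interest.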

DiseaseCARD 

Disease is a condition that arises in a living organism, animal or
plant, when something malfunctions and impairs the normal operation
of the organism. Developing an understanding of the factors that
cause disease motivates most biological research. To date,
DNAtraffic has collected 121 diseases related to dysfunction in 77
proteins involved in the DNA networks. Data were taken from the OMIM
(22) and KEGG (13) databases as well as directly from PubMed. This
step required manual annotation work. Each disease has its own
DiseaseCARD entry with a succinct description, a link to the
protein(s) and sometimes a picture of the symptoms. Reciprocal links
to diseases are also available in each protein and pathway field
(Figure 1).

DamageCARD 

As of 13 October 2011, we had collected information about 146
different types of DNA damage. Many of them describe general classes
of damage events, such as methylation, oxidative damage,
single-strand breaks or base loss, which are independent of the
local sequence. About 50 chemical compounds that cause DNA damage
were connected to the appropriate types of damage. Each type of
damage is described in its own DamageCARD entry, which includes
information about the potential source (e.g. spontaneous formation,
intermediate in some DNA repair process, methylating agents, etc.),
proteins that may recognize its presence in the DNA, keywords that
facilitate analysis of its context and external links (if available)
to: PubChem Compound (CID), PubChem Substance (SID), ChemSpider,
KEGG Compound, ChEBI and ChEMBL. The DNAtraffic database also
displays the unique chemical structures of DNA lesions in 2D and
provides atomic coordinates for download in the SMILES, InChI and
InChIKey formats.

DrugCARD 

To date, we have collected information about more than 181 different
types of drugs interacting with DNA or with proteins involved in
nucleic acid metabolism. Data were taken from the DrugBank, T3DB,
Therapeutic Target Database (TTD) and KEGG Compound databases
(13,23-25). Each type of drug is described in its own DrugCARD
entry, which includes information about the potential application
(e.g. anticancer treatment, DNA topoisomerase inhibitor and others),
the drug-protein or drug-DNA interaction and external links to the
DrugBank, KEGG Compound, PubChem Compound, PubChem Substance,
ChemSpider, ChEBI, ChEMBL and TTD databases. The DNAtraffic database
also displays the unique chemical structures of drugs in 2D and
provides atomic coordinates for download in the SMILES, InChI and
InChIKey formats.

SCHEME OF THE DNAtraffic DATABASE 
ARCHITECTURE 

Unordered data are difficult to interpret and many of the
connections are lost. The OWL ontology provides a clear view and
reveals new connections. The DNAtraffic
database has been implemented using the Django web framework
(http://www.djangoproject.com/). It uses a PostgreSQL relational
database to store data (http://www.postgresql.org/). Scripts are
written in the Python language. The DNAtraffic database is freely
available and can be accessed at
http://dnatraffic.ibb.waw.pl/dnatraffic/.
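
The reciprocal links between CARD entries (for example, from a protein to its diseases and back) map naturally onto a relational schema with link tables. The Python sketch below illustrates the idea under stated assumptions: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL and plain SQL instead of Django models, and all table, column and function names are hypothetical rather than taken from the DNAtraffic implementation. The two protein-disease pairs are well-established examples used only as sample data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a protein-disease link table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE protein (
            id INTEGER PRIMARY KEY, name TEXT NOT NULL, organism TEXT NOT NULL
        );
        CREATE TABLE disease (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- Link table: one row per protein-disease association.
        CREATE TABLE protein_disease (
            protein_id INTEGER NOT NULL REFERENCES protein(id),
            disease_id INTEGER NOT NULL REFERENCES disease(id),
            PRIMARY KEY (protein_id, disease_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO protein (id, name, organism) VALUES (?, ?, ?)",
        [(1, "XPA", "Homo sapiens"), (2, "BLM", "Homo sapiens")],
    )
    conn.executemany(
        "INSERT INTO disease (id, name) VALUES (?, ?)",
        [(1, "Xeroderma pigmentosum"), (2, "Bloom syndrome")],
    )
    conn.executemany(
        "INSERT INTO protein_disease (protein_id, disease_id) VALUES (?, ?)",
        [(1, 1), (2, 2)],
    )
    return conn


def diseases_for_protein(conn: sqlite3.Connection, protein_name: str) -> list:
    """Follow the link from a protein entry to its disease entries."""
    rows = conn.execute(
        "SELECT d.name FROM disease d "
        "JOIN protein_disease pd ON pd.disease_id = d.id "
        "JOIN protein p ON p.id = pd.protein_id "
        "WHERE p.name = ? ORDER BY d.name",
        (protein_name,),
    )
    return [name for (name,) in rows]


def proteins_for_disease(conn: sqlite3.Connection, disease_name: str) -> list:
    """Follow the reciprocal link from a disease entry to its proteins."""
    rows = conn.execute(
        "SELECT p.name FROM protein p "
        "JOIN protein_disease pd ON pd.protein_id = p.id "
        "JOIN disease d ON d.id = pd.disease_id "
        "WHERE d.name = ? ORDER BY p.name",
        (disease_name,),
    )
    return [name for (name,) in rows]


conn = build_demo_db()
print(diseases_for_protein(conn, "BLM"))                    # ['Bloom syndrome']
print(proteins_for_disease(conn, "Xeroderma pigmentosum"))  # ['XPA']
```

Because both directions are answered from the same link table, the protein-to-disease and disease-to-protein views cannot drift apart. The same pattern would extend to damage, drug and pathway entries.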
Figure 1. Scheme of the DNAtraffic database.



EXPANDED DATABASE LINKAGES 

Because DNAtraffic was designed to cover a broad spectrum of
scientific disciplines, it must be extensively linked to many
external databases. To date, DNAtraffic contains up to 15 database
hyperlinks, including links to KEGG (13), UniProt (19), OMIM (22),
PDB (26), PubChem (27), ChEBI (28), ChEMBL, GenBank (29), Pfam (30),
GeneCards (31), GenAtlas (32), HGNC, PubMed, ChemSpider (33) and TTD
(25).



CONCLUSION 

Researchers of the various chromatin structure and DNA 
repair processes have recently embraced approaches in 
which global measurements of gene expression and the 
proteome can be combined with genome-wide screening 
of sensitivity mutants to develop an integrated view of 
how cells respond to and protect themselves against 
DNA damaging agents. The emerging picture from these 
global genomic studies is quite different from the previous 
concept of DNA repair, cell cycle control and induction of 
apoptosis as being independent processes. In fact these
processes appear to form a fully integrated network. 
Integration of these genome-wide measurements allows 
the development of specific models of response networks 
that could not have been detected or discerned previously. 

The DNAtraffic database is the first platform for systems biology of
DNA integrity during the cell life, and can also be integral to
translational research (18). This includes the identification of
small molecule inhibitors of novel DNA damage response (DDR)
pathways that shed new light on the causes of cancer or have
potential uses in treatment.

DNAtraffic contains a substantial amount of data. As highlighted
throughout this article, numerous improvements have been made in the
quantity, quality, depth and organization of the information
provided. DNAtraffic contains illustrated DNA networks in the cell,
together with structure data and pictures for proteins, damage and
drugs. DNAtraffic also offers expanded database links. It is hoped
that DNAtraffic will continue to develop to fulfil the needs of its
users and provide an increasingly useful, information-rich DNA
metabolism resource.